ABSTRACT

An apparatus is described that comprises a data processing unit and at least one coprocessor. The data processing unit comprises a register file having registers, a memory, a plurality of execution units, a coprocessor interface for coupling the at least one coprocessor with the data processing unit, and a pipeline configuration for processing instructions having a fetch stage for fetching an instruction from the memory, a decode stage for decoding an operational code from the instruction, an execution stage for activating one of the execution units, and a write-back stage for writing back from the execution unit. The data processing unit comprises read- and write-lines coupling the register file with the coprocessor for exchanging operands, at least one control line indicating that the coprocessor is busy, and a plurality of control lines from the decode stage for controlling the coprocessor, which are operated upon detection of a coprocessor instruction. The coprocessor uses the registers from the register file during execution of the coprocessor instruction. The coprocessor comprises a decode unit for decoding the coprocessor instruction and a plurality of coprocessor execution units that share the decode unit; the decode unit selects one of the coprocessor execution units based on the coprocessor instruction, and the selected coprocessor execution unit performs the coprocessor instruction.

13 Claims, 2 Drawing Sheets 
DATA PROCESSING UNIT WITH INTERFACE FOR SHARING REGISTERS BY A PROCESSOR AND A COPROCESSOR

BACKGROUND OF THE INVENTION

The present invention relates to a data processing unit with a coprocessor interface. A coprocessor is used in a data processing system to perform special tasks, such as floating point operations, digital signal processing, etc. Many data processors are capable of working in combination with a coprocessor. Usually, a main processor addresses a coprocessor through the system bus. If the main processor decodes a coprocessor instruction, it transfers the coprocessor instruction and the respective data to a coprocessor, for example by means of an exception routine; the coprocessor performs the instruction and transfers a result back to the main processor. During execution of the coprocessor, the main processor is usually set in a wait state.

U.S. Pat. No. 5,603,047 describes such a system. FIG. 7 of U.S. Pat. No. 5,603,047 shows a block diagram of such a coprocessor having 24 registers. A coprocessor instruction has a specific format which is detected during the decode stage of the pipeline shown in FIG. 2 of U.S. Pat. No. 5,603,047. The respective coprocessor instructions are described in column 20 of U.S. Pat. No. 5,603,047. They include instructions for loading and storing data and control from or to the coprocessor. The coprocessor may be able to perform a variety of functions, which might be selected by various programs that can be selected through respective addresses transferred to the coprocessor. The coprocessor executes these programs and, when finished, the respective results can be transferred to the main processor through respective transfer instructions.

SUMMARY OF THE INVENTION

In many applications high-speed processing of data is necessary. Therefore, there is a high demand for performing certain tasks within a single cycle of the system clock. Most instructions of known microprocessors or microcontrollers can be executed within a single cycle due to superscalar and superpipeline techniques. Nevertheless, many special instructions are either not available on, for example, reduced instruction set computers, or need a plurality of execution cycles. Even with the addition of coprocessors, these tasks cannot be executed in the required time due to cumbersome transfer protocols between the main processor and a coprocessor.

Therefore, it is an object of the present invention to provide a data processing unit with a coprocessor interface that overcomes the above-mentioned problems.

This object is achieved according to the present invention by an apparatus that comprises a data processing unit and at least one coprocessor. The data processing unit comprises a register file having registers, a memory, a plurality of execution units, a coprocessor interface for coupling the at least one coprocessor with the data processing unit, and a pipeline configuration for processing instructions having a fetch stage for fetching an instruction from the memory, a decode stage for decoding an operational code from the instruction, an execution stage for activating one of the execution units, and a write-back stage for writing back from the execution unit. The data processing unit comprises read- and write-lines coupling the register file with the coprocessor for exchanging operands, at least one control line indicating that the coprocessor is busy, and a plurality of control lines from the decode stage for controlling the coprocessor, which are operated upon detection of a coprocessor instruction. The coprocessor uses the registers from the register file during execution of the coprocessor instruction. The coprocessor comprises a decode unit for decoding the coprocessor instruction and a plurality of coprocessor execution units that share the decode unit; the decode unit selects one of the coprocessor execution units based on the coprocessor instruction, and the selected coprocessor execution unit performs the coprocessor instruction.

Because the coprocessor uses the register file of the main processor, it can execute instructions as fast as any other execution unit, such as the arithmetic logic unit, a shifter, a load/store unit, etc. A coprocessor instruction is decoded and executed in the same manner as any other instruction.

In a further embodiment a field programmable gate array (FPGA) is used as a coprocessor. Thus, a wide variety of additional instructions can be executed, and the instruction variety can be expanded dynamically by re-programming the FPGA.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of the relevant parts of a data processing unit including a coprocessor interface according to the present invention;

FIG. 2 shows the format of a coprocessor instruction;

FIG. 3 shows a block diagram of an embodiment of a single coprocessor; and

FIG. 4 shows a block diagram of an embodiment of four coprocessors.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

FIG. 1 shows a memory cache subsystem 1 coupled through a bus with a register file 2. Register file 2 contains an align unit 201, address buffer 202 and data buffer 207, address registers 203 and data registers 208, address forwarding unit 204 and data forwarding unit 209, address write-back buffer 205 and data write-back buffer 210, and a control unit 206. In the preferred embodiment only the data registers are interfaced with the coprocessor. Therefore, for clarity, only the most relevant connecting lines are shown in FIG. 1. Nevertheless, any kind of register from a register file can be used to interface with the coprocessor interface. The data registers 208 are coupled through data buffer 207 and align unit 201 with the cache memory subsystem 1.

To interface with the different execution units 3a . . . 3n, three different read busses are provided. The first read bus 211 comprises 64 bit lines, the second read bus 212 has 32 bit lines, and the third read bus 213 also provides 32 bit lines. Of course, the number of bit lines per read port is freely selectable and depends on the instruction set. Furthermore, a write bus 214 having 64 bit lines is provided. These four busses 211, 212, 213, and 214 allow read and write access to the respective data registers 208 of the register file 2. An instruction fetch unit 5 provides instructions to a following instruction decoder 6. The instruction decoder 6 provides all execution units with the respective operational codes and selects the respective registers 203, 208 in the register file 2. A coprocessor interface 7 is provided which is coupled with the four busses 211, 212, 213, and 214. Furthermore, coprocessor interface 7 is coupled through busses 61 and 62 with instruction decoder 6. Bus 61 can have n instruction lines for providing operational code and other information. In addition, bus 62 has m control lines to provide the pipeline with status information from the coprocessors.
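To make the data path of FIG. 1 concrete, the following Python sketch models register file 2 as reachable only through the three read busses and the single write bus, with an ordinary execution unit and a coprocessor operation using the very same ports. The bus numbers and widths come from the description above; the register count (16), the function names `alu_add` and `coprocessor_xor`, and the choice of XOR as the coprocessor operation are illustrative assumptions, not part of the described embodiment.

```python
from dataclasses import dataclass, field

# Widths in bits of the busses between register file 2 and the execution
# units / coprocessor interface 7, as stated for FIG. 1.
READ_BUS_WIDTHS = {211: 64, 212: 32, 213: 32}
WRITE_BUS_WIDTH = 64  # write bus 214

# Assumption: 16 data registers, each modelled as a plain Python int.
NUM_DATA_REGISTERS = 16


@dataclass
class RegisterFile:
    """Toy model of data registers 208 reached only through busses 211-214."""
    regs: list = field(default_factory=lambda: [0] * NUM_DATA_REGISTERS)

    def read(self, bus, index):
        """Drive register `index` onto read bus `bus`, truncated to its width."""
        width = READ_BUS_WIDTHS[bus]
        return self.regs[index] & ((1 << width) - 1)

    def write(self, index, value):
        """Write a result back over write bus 214."""
        self.regs[index] = value & ((1 << WRITE_BUS_WIDTH) - 1)


def alu_add(rf, s1, s2, d):
    """An ordinary execution unit: two operands in, one result out."""
    rf.write(d, rf.read(212, s1) + rf.read(213, s2))


def coprocessor_xor(rf, s1, s2, d):
    """Hypothetical coprocessor operation on the very same busses, so no
    operand has to be copied into coprocessor-private registers first."""
    rf.write(d, rf.read(212, s1) ^ rf.read(213, s2))
```

In this model the only difference between an execution unit and a coprocessor is which function the decoder activates; both see identical operands from the shared registers.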
The control bus 61, 62 can have the following functionality: One line can indicate a valid instruction and would be asserted when the integer pipeline is valid. Another line or set of lines could be provided for an instruction sequencer; depending on the number of instruction cycles needed, a 2-bit, 3-bit, 4-bit, etc., wide bus would be provided. A further line can indicate a multi-cycle start; it would be activated by the coprocessor to indicate that the instruction in the coprocessor decoder is a multi-cycle instruction. Yet another line would be activated by the coprocessor to indicate the end of a multi-cycle instruction, signaling the last re-inject of the instruction. Also, a multi-cycle continue control line can be provided, which would be activated by the coprocessor to re-inject an instruction during the multi-cycle start and end phase. A further control line may be provided to indicate an invalid opcode. Further control lines indicate which coprocessor has to be enabled; for example, two lines can address four different coprocessors. Other control signals may be provided depending on the structure of the coprocessor unit.

The embodiment according to FIG. 1 shows three coprocessors. The number of coprocessors which can be added to the system internally or externally depends on the instruction size of the data processing unit, as will be explained later. The first coprocessor 4a in this embodiment is a floating point coprocessor. The second coprocessor 4b is a fuzzy logic coprocessor, and the third coprocessor is a re-programmable coprocessor in the form of an FPGA. All coprocessors are coupled with the six busses 211, 212, 213, 214, 61, and 62 through interface 7.

FIG. 2 shows two possible formats A and B of a coprocessor instruction. In this embodiment an instruction is 32 bits long, and the bit fields indicating a coprocessor instruction can be one or both of the opcode fields OPCODE 1, OPCODE 2, and OP 1, OP 2, respectively. The bit field D indicates the destination in the form of a register number to which the result of the respective instruction will be written. The bit field # indicates the number of the coprocessor for executing the instruction defined in the opcode bit field. Bit fields S1, S2, S3 contain either a data register number or immediate data for the respective instruction. In this embodiment each of the bit fields S1, S2, S3, and D is 4 bits wide, and the OPCODE field comprises 12 or 16 bits. The # field has 2 bits, and 2 bits are unused in both instruction formats A and B, indicated as

Instruction fetch unit 5 provides instruction decoder 6 with an instruction from an instruction stream. Instruction decoder 6 determines whether an instruction is designated to a coprocessor by means of the bit fields OPCODE 1, OPCODE 2, and OP 1, OP 2, respectively. After decoding of an instruction, the coprocessor indicated in the bit field # receives, through bus 61, the respective instruction stored in the opcode bit fields and possibly immediate data from one or more of the bit fields S1, S2, S3, and, through the three data read busses 211, 212, and 213, the contents of the data registers selected in bit fields S1, S2, and S3. In the following execution cycle the coprocessor executes the instruction decoded by the instruction decoder, and during the write-back cycle it writes the respective result back to a data register designated in bit field D. Thus, execution of a coprocessor instruction can be as quick as execution by any other execution unit. No transfers from or to registers delay the process of executing a special instruction, because the respective coprocessor does not need its own registers. Nevertheless, a coprocessor may have additional registers which contain data that need not be accessible by the data processing unit.

On the other hand, a conventional coprocessor usually needs to be initialized by transferring data to the coprocessor, configuring the coprocessor, and transferring the respective instruction to the coprocessor. This creates an overhead affecting the overall speed of the system. Thus, a known coprocessor will stall the respective pipelines for a plurality of cycles. The coprocessor according to the present invention does not need these steps. It can operate directly with the register file of the main CPU. Transfer of data is similar to the transfer of data to regular execution units. Thus, every instruction which can be executed in a single cycle can be executed in parallel with another pipeline or multiple pipelines. In the embodiment of FIG. 1 this would be the load/store pipeline coupled with the address register file 203 and the units 202, 204, 205. The pipelines only get stalled by a multi-cycle instruction, just as they would be with any execution unit of the central processing unit. For this purpose, the control lines indicating a multi-cycle start, a multi-cycle end, and a multi-cycle continuation described above are used.

Using an FPGA as a coprocessor offers additional benefits. A microcontroller system using a data processing unit according to the present invention is initially programmed for its specific task. The FPGA may be re-programmed and adapted dynamically to each specific task of a complex program. For example, an instruction for performing a convolution operation is not available in the standard instruction sets of either a RISC or a CISC processor. Such an instruction forms, for example, a 32-bit word out of two 16-bit words by alternately concatenating the bits of each input word. For example, if the first input word contains only "1111 . . . 111" and the second input word contains only "0", the result would be a 32-bit word with alternating "0" and "1". In other words, the resulting word consists of bit 16 of the first word, followed by bit 16 of the second word, followed by bit 15 of the first word, and so on. To perform such an operation, a plurality of instructions has to be executed in a conventional microprocessor system. An FPGA can easily be programmed to couple a multiplexer or respective logic with the input and output lines to perform this task in a single cycle. Because such an instruction can be performed with the registers of the data processing unit, no additional transfers are necessary.

The embodiment of a coprocessor interface according to the present invention provides three data read busses 211, 212, and 213 and one write-back bus 214. Thus, digital signal processing functionality can be provided by the coprocessors. For example, a single instruction can perform a multiplication of two operands and an addition of the result to a third operand. The final result is written into a designated register. All three operands can be transferred to the respective coprocessor during the decode cycle, and the result can be written back to the destination register during the write-back cycle.

FIG. 3 shows the main blocks of a coprocessor 4 coupled with a coprocessor interface according to the invention. Each coprocessor may have a decode unit 41 which receives the respective coprocessor instruction from the CPU. Decode unit 41 decodes the instruction, for example bits 16 to 23 of an instruction as shown in FIG. 2. Then, decode unit 41 provides an execute unit 42, coupled with decode unit 41, with the respective control signals. The execute unit may contain multiplexers, adders, shifters, etc., connected so as to perform the respective functions. The control signals provided by decode unit 41 activate the respective units to operate in a predetermined way. The result is passed to the coprocessor interface, which couples the result bus to the write-back bus of the integer pipeline. Thus, the coprocessor behaves like an additional execution unit as shown in FIG. 1.
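The instruction format of FIG. 2 and the single coprocessor of FIG. 3 can be sketched together as below. The description fixes only the field widths (4-bit S1, S2, S3 and D, a 2-bit # field, a 12-bit opcode and 2 unused bits for the 12-bit-opcode format, here called format A by assumption). The bit positions in `FORMAT_A`, the marker value `COPROC_OPCODE1` and the sub-opcode values are hypothetical, chosen so that the coprocessor sub-opcode occupies bits 16 to 23 as in the FIG. 3 decode example. The execute unit implements the multiply-add instruction mentioned above.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical bit layout for format A: field -> (lowest bit, width).
# Only the widths come from the description; every position is an assumption.
FORMAT_A = {
    "OPCODE1": (0, 4),   # with OPCODE2 this gives the 12-bit OPCODE field
    "D": (4, 4),         # destination register number
    "S1": (8, 4),
    "S2": (12, 4),
    "OPCODE2": (16, 8),  # bits 16..23, decoded by coprocessor decode unit 41
    "S3": (24, 4),
    "NUM": (28, 2),      # the '#' field: selects one of up to four coprocessors
    "UNUSED": (30, 2),
}
# 12 opcode bits + 4 * 4 register bits + 2 (#) + 2 unused = 32 bits.
assert sum(width for _, width in FORMAT_A.values()) == 32

COPROC_OPCODE1 = 0xF  # hypothetical OPCODE1 value marking a coprocessor instruction


def decode_fields(word, layout=FORMAT_A):
    """Split a 32-bit instruction word into its named bit fields."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in layout.items()}


def encode_fields(fields, layout=FORMAT_A):
    """Inverse of decode_fields; fields left out are encoded as zero."""
    word = 0
    for name, value in fields.items():
        lsb, width = layout[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


class MacCoprocessor:
    """FIG. 3 in miniature: decode unit 41 drives a single execute unit 42."""

    # Hypothetical sub-opcodes -> value of an "accumulate" control signal.
    CONTROLS = {0x01: False, 0x02: True}

    def decode(self, sub_opcode):
        """Decode unit 41: turn the sub-opcode into control signals."""
        if sub_opcode not in self.CONTROLS:
            raise ValueError(f"invalid coprocessor opcode {sub_opcode:#x}")
        return self.CONTROLS[sub_opcode]

    def execute(self, accumulate, a, b, c):
        """Execute unit 42: a multiplier optionally followed by an adder."""
        product = (a * b) & MASK32
        return (product + c) & MASK32 if accumulate else product


def issue(word, regs, coprocessors):
    """Decode, execute and write back one coprocessor instruction."""
    f = decode_fields(word)
    if f["OPCODE1"] != COPROC_OPCODE1:
        raise ValueError("not a coprocessor instruction")
    cop = coprocessors[f["NUM"]]
    # Operands come straight from the shared register file (read busses 211-213).
    a, b, c = regs[f["S1"]], regs[f["S2"]], regs[f["S3"]]
    regs[f["D"]] = cop.execute(cop.decode(f["OPCODE2"]), a, b, c)  # bus 214
```

For example, with registers 1, 2 and 3 holding 6, 7 and 100, an instruction encoded with S1=1, S2=2, S3=3, D=4 and sub-opcode 0x02 leaves 6 * 7 + 100 = 142 in register 4, without any operand being copied into the coprocessor first.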
FIG. 4 shows a solution where multiple execution units
43, 44, 45, and 46 share the same decode unit 41. Decode 
unit 41 decodes the respective coprocessor instruction and 
selects one of the execution units 43, 44, 45, or 46 which 
performs the respective function. The result is again written 
back through interface 7 into the register file. 
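The FIG. 4 arrangement can be sketched as a single decode step that selects exactly one execution unit. The two units below are the example operations from the description: the FPGA "convolution", which interleaves the bits of two 16-bit words (reading "bit 16" as the most significant bit), and the multiply-add DSP instruction. The sub-opcode values and the class name `SharedDecodeUnit` are hypothetical. In hardware the interleave is just wiring through a multiplexer, which is why the text expects a single cycle; the loop here only documents the bit order.

```python
MASK32 = 0xFFFFFFFF


def interleave16(a, b, _c=0):
    """Bit interleave from the FPGA example: bit 16 (the MSB) of a, then bit 16
    of b, then bit 15 of a, and so on, giving one 32-bit word."""
    result = 0
    for i in range(15, -1, -1):  # walk both 16-bit inputs from MSB to LSB
        result = (result << 1) | ((a >> i) & 1)
        result = (result << 1) | ((b >> i) & 1)
    return result


def multiply_accumulate(a, b, c):
    """DSP-style single instruction: a * b + c, truncated to 32 bits."""
    return (a * b + c) & MASK32


class SharedDecodeUnit:
    """FIG. 4 in miniature: one decode unit 41 in front of several execution
    units. The sub-opcode values and the choice of units are hypothetical."""

    def __init__(self):
        self.execution_units = {
            0x10: interleave16,         # e.g. a unit realised in the FPGA
            0x11: multiply_accumulate,  # e.g. a DSP-style unit
        }

    def execute(self, sub_opcode, a, b, c=0):
        # Decoding reduces to selecting exactly one execution unit.
        unit = self.execution_units.get(sub_opcode)
        if unit is None:
            raise ValueError(f"invalid coprocessor opcode {sub_opcode:#x}")
        return unit(a, b, c)
```

With the inputs from the text, a first word of all ones and a second word of all zeros, `interleave16(0xFFFF, 0x0000)` yields the alternating pattern 0xAAAAAAAA.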

If a coprocessor needs a longer execution time, the pipeline
of the data processing unit has to be stalled. For this
purpose, additional control lines 62 are provided
which supply information from the coprocessors to the 
pipeline as described above. For example, the coprocessor 
executing a respective instruction which needs a plurality of 
system cycles sends a busy signal through bus 62 to the 
instruction decode unit 6 to stall the pipeline. 
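The busy and multi-cycle control lines can be illustrated with a cycle-by-cycle trace of a toy in-order pipeline. Which cycles carry start, continue and end is one plausible reading of the control-line description, not a timing specification from the source, and the instruction names used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    cycles: int = 1  # execution cycles; more than 1 makes it a multi-cycle instruction


def control_trace(program):
    """Cycle-by-cycle trace of a toy in-order pipeline and its coprocessor lines.

    Assumed semantics: a multi-cycle instruction raises "mc_start" in its first
    cycle, "mc_continue" in its middle cycles and "mc_end" in its last cycle,
    and holds "busy" in every cycle but the last, which keeps the decode stage
    from issuing the next instruction.
    """
    trace = []
    cycle = 0
    for instr in program:
        n = instr.cycles
        multi = n > 1
        for k in range(n):
            trace.append({
                "cycle": cycle,
                "instr": instr.name,
                "busy": k < n - 1,                  # stall request on bus 62
                "mc_start": multi and k == 0,
                "mc_continue": multi and 0 < k < n - 1,
                "mc_end": multi and k == n - 1,
            })
            cycle += 1
    return trace


def stall_cycles(trace):
    """Cycles in which the decode stage could not issue a new instruction."""
    return sum(1 for row in trace if row["busy"])
```

With a four-cycle coprocessor instruction placed between single-cycle instructions, the decode stage loses three cycles, exactly those in which busy is asserted; single-cycle coprocessor instructions add no stall at all.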

The coprocessor interface includes all necessary buffers and logic to feed the necessary signals to or from the coprocessors. Thus, the coprocessors according to the present invention can be coupled with the coprocessor interface 7 either on-chip or externally. In the preferred embodiment the coprocessors are coupled with the integer pipeline. In different embodiments with different pipeline structures, the coprocessor interface can also be coupled with a different type of pipeline or with more than one pipeline. Thus, two or more coprocessors could operate in parallel.

What is claimed is: 

1. Apparatus comprising: 

a) a data processing unit including 

1) a register file having registers, 

2) a memory, 

3) a first bus coupling said register file with said 
memory,

4) a plurality of execution units, 

5) a pipeline configuration for processing instructions 
having a fetch stage for fetching an instruction from 
said memory, a decode stage for decoding an opera- 
tional code from said instruction, an execution stage 
for activating one of said execution units, and a 
write-back stage for writing back from said execu- 
tion unit; 

b) a coprocessor; 

c) a coprocessor interface for coupling said coprocessor 
with said data processing unit; 

d) a second bus including read-lines coupling said register 
file with said plurality of execution units and said 
coprocessor; 

e) a third bus including write-lines coupling said register
file with said plurality of execution units and said 
coprocessor, the second and third buses exchanging 
operands between said registers and said plurality of 
execution units and between said registers and said 
coprocessor; 

f) at least one control line from said coprocessor to said 
pipeline configuration for indicating that said copro- 
cessor is busy; and 
g) a plurality of control lines from said decode stage in
said data processing unit to said coprocessor to provide 
said operational code to said coprocessor, the plurality 
of control lines operated upon detection by the decode
stage that the instruction is a coprocessor instruction;

h) whereby said coprocessor uses said registers from said 
register file during execution of a coprocessor instruc- 
tion. 

2. Apparatus according to claim 1, wherein said read- and 
write-lines include a plurality of read lines to read at least
two operands from said register file and a plurality of write 
lines to write-back at least one operand. 

3. Apparatus according to claim 1, wherein each instruction contains a bit field for use by the decode stage to determine whether the instruction is a coprocessor instruction and a bit field indicating the operational code for said coprocessor.

4. Apparatus according to claim 1, wherein a pipeline execution is stalled upon a busy signal from said coprocessor.

5. Apparatus according to claim 1 further comprising programming means for programming a programmable gate array and wherein said coprocessor is formed by a programmable gate array.

6. Apparatus according to claim 1 further having a control line that is capable of being activated by the coprocessor to indicate a multi-cycle start when an instruction in the coprocessor is a multi-cycle instruction.

7. Apparatus according to claim 6 further having a control line that is capable of being activated by the coprocessor to indicate an end of a multi-cycle instruction.

8. Apparatus according to claim 1 further having a control line that is capable of being activated by the coprocessor to re-inject an instruction to the data processing unit during a multi-cycle start and end phase.

9. Apparatus according to claim 1, wherein said coprocessor includes a coprocessor configured to perform a convolution operation in a single cycle.

10. Apparatus according to claim 1, wherein said copro- 
cessor includes a fuzzy logic coprocessor. 

11. Apparatus according to claim 1 wherein said coprocessor includes coprocessor registers not accessible by the data processing unit.

12. Apparatus according to claim 1, wherein the copro- 
cessor comprises a decode unit for decoding said coproces- 
sor instruction and at least one execution unit for executing 
said coprocessor instruction. 

13. Apparatus according to claim 1, wherein the copro- 
cessor comprises a plurality of execution units and said 
decode unit selects one of the execution units upon said 
coprocessor instruction. 

***** 